PalQuant: Accelerating High-Precision Networks on Low-Precision Accelerators

Abstract

Recently, low-precision deep learning accelerators (DLAs) have become popular due to their advantages in chip area and energy consumption, yet the quantized models on these DLAs bring severe accuracy degradation. One way to achieve both high accuracy and efficient inference is to deploy high-precision neural networks on low-precision DLAs, which is rarely studied. In this paper, we propose the PArallel Low-precision Quantization (PalQuant) method, which approximates high-precision computations via parallel low-precision representations learned from scratch. In addition, we present a novel cyclic shuffle module to boost cross-group information communication between parallel low-precision groups. Extensive experiments demonstrate that PalQuant has superior performance to state-of-the-art quantization methods in both accuracy and speed; e.g., for ResNet-18 network quantization, PalQuant can obtain 0.52% higher accuracy and a 1.78× speedup simultaneously over the 4-bit counterpart on a 2-bit accelerator. Code is available at https://github.com/huqinghao/PalQuant.

Keywords: Quantization, Network acceleration, CNNs
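The abstract only sketches the mechanism, so the following is a minimal PyTorch-style sketch of the idea as stated, not the paper's implementation: the fake_quantize helper, the choice of two 2-bit branches, the summed branch outputs, and the channel layout assumed by cyclic_shuffle are all illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(w, bits=2):
    # Uniform fake-quantization to 2**bits evenly spaced levels in [-1, 1]
    # (an illustrative quantizer; the paper's quantizer may differ).
    levels = 2 ** bits - 1
    w = w.clamp(-1.0, 1.0)
    return torch.round((w + 1) / 2 * levels) / levels * 2 - 1

class ParallelLowPrecisionConv(nn.Module):
    # Approximates one high-precision conv with `groups` parallel branches,
    # each running a conv with `bits`-bit weights; summing the branch outputs
    # lets the low-precision ensemble emulate a higher-precision computation.
    def __init__(self, in_ch, out_ch, groups=2, bits=2):
        super().__init__()
        self.bits = bits
        self.weights = nn.ParameterList([
            nn.Parameter(torch.randn(out_ch, in_ch, 3, 3) * 0.1)
            for _ in range(groups)
        ])

    def forward(self, x):
        return sum(F.conv2d(x, fake_quantize(w, self.bits), padding=1)
                   for w in self.weights)

def cyclic_shuffle(x, groups=2):
    # Assumes the channels of x are laid out as `groups` contiguous blocks,
    # one per low-precision group; rolling by one block passes each group's
    # features to the next group in a cycle (cross-group communication).
    return torch.roll(x, shifts=x.size(1) // groups, dims=1)

As a quick smoke test, ParallelLowPrecisionConv(16, 32)(torch.randn(1, 16, 8, 8)) returns a [1, 32, 8, 8] tensor, and cyclic_shuffle on a group-blocked tensor rotates its channel blocks by one position.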

Similar articles

Accelerating Convolutional Neural Networks Using Low Precision Arithmetic

The recent trend in convolutional neural networks (CNN) [2] is to have deeper, multilayered structures. While this improves the accuracy of the model, the amount of computation and the amount of data involved in learning and inference increase. In order to solve this problem, several techniques have been proposed to reduce the amount of data and the amount of computation by lowering the numerical...

High-Accuracy Low-Precision Training

Low-precision computation is often used to lower the time and energy cost of machine learning, and recently hardware accelerators have been developed to support it. Still, it has been used primarily for inference—not training. Previous low-precision training algorithms suffered from a fundamental tradeoff: as the number of bits of precision is lowered, quantization noise is added to the model, ...
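The tradeoff described above is easy to reproduce numerically. The snippet below is a self-contained illustration (not from the paper): it quantizes random values uniformly at several bit widths and shows the noise standard deviation roughly doubling with every bit removed.

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100_000)

for bits in (8, 4, 2):
    levels = 2 ** bits - 1
    # Uniform quantization of [-1, 1] to 2**bits evenly spaced levels.
    xq = np.round((x + 1) / 2 * levels) / levels * 2 - 1
    print(f"{bits}-bit quantization noise (std): {np.std(xq - x):.4f}")

With step size 2 / (2**bits - 1), the measured noise matches the theoretical step / sqrt(12), so each bit of precision removed roughly doubles the quantization noise the training algorithm has to absorb.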

Brain circuit implementation: High-precision computation from low-precision components

Introduction Attempts to understand, let alone augment or supplant, the operation of brain circuitry rely not only on our knowledge of isolated neuron, circuit, and brain slice behavior but also on the impressive computations achieved by assemblies of these components. The variability of the constituent elements of these circuits suggests that they are arranged to operate in ways not obvious fr...

High Precision Survey and Alignment of Large Linear Accelerators

Future linear accelerators require new survey techniques to achieve the necessary alignment precision. For TESLA, the demanded accuracy for the alignment of the components is 0.5 mm horizontal and 0.2 mm vertical, both over each 600 m section. Other proposed linear colliders require similar accuracies. These demands cannot be fulfilled with common, open-air geodetic methods, mainly because of refra...

Low Precision Neural Networks using Subband Decomposition

Large-scale deep neural networks (DNN) have been successfully used in a number of tasks, from image recognition to natural language processing. They are trained using large training sets on large models, making them computationally and memory intensive. As such, there is much interest in research and development for faster training and test time. In this paper, we present a unique approach using l...

Journal

Journal title: Lecture Notes in Computer Science

Year: 2022

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-031-20083-0_19